Release v1.7.4 — external-wait formula + CPU:Elapsed adjustment#260
Release v1.7.4 — external-wait formula + CPU:Elapsed adjustment#260erikdarlingdata merged 1 commit intomainfrom
Conversation
…nt (#215 C6, C7) (#259) C6 — external-wait benefit formula: The existing parallel-wait formula uses per-thread wait = max(0, elapsed - cpu). External / preemptive waits (MEMORY_ALLOCATION_EXT, RESERVED_MEMORY_ALLOCATION_EXT, PREEMPTIVE_*) are CPU-busy in kernel, so elapsed ≈ cpu for those threads and the wait barely shows in that math. Joe's formula: wait_cpu_share = wait_ms / total_cpu_ms sum_max_cpu = Σ max-thread-self-cpu across operators benefit_ms = wait_cpu_share * sum_max_cpu Verified against Joe's worked example on MwX36LEHJS: (26307 / 45075) * (11341 + 42) / 13748 = 48.3% ✓ Implementation: - New IsExternalWait classifier (public so mapping can see it) covering MEMORY_ALLOCATION* and PREEMPTIVE_* prefixes. - OperatorWaitProfile grows MaxThreadCpuMs — per-thread self-CPU (non-cumulative), taken from new PlanAnalyzer.GetOperatorMaxThreadOwnCpuMs which mirrors GetOperatorOwnElapsedMs. Uses self-CPU so summing across operators in a serial plan doesn't double-count cumulative CPU, and sums to approximate critical-path CPU in parallel plans. - ScoreWaitStats routes external waits to CalculateExternalWaitBenefit. - Serial plans with external waits: sum-of-self-cpu ≈ statement cpu, so the formula collapses to wait/elapsed — same as before, no regression. C7 — CPU:Elapsed numerator: External waits inflate the CPU number without representing real query CPU, so the CPU:Elapsed ratio (used in web runtime card, HTML export runtime card, and Avalonia DOP efficiency) subtracts Σ external-wait ms from CPU before dividing by elapsed. - AnalysisResult.QueryTimeResult gains ExternalWaitMs, populated in ResultMapper by summing wait stats matching IsExternalWait. - HtmlExporter.WriteRuntimeCard, Index.razor runtime card, and PlanViewerControl DOP-efficiency calc all use (cpu - externalWaitMs). Version bump 1.7.3 -> 1.7.4. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
erikdarlingdata
left a comment
There was a problem hiding this comment.
Summary
v1.7.4 release: external/preemptive waits (MEMORY_ALLOCATION_*, PREEMPTIVE_*) now get CPU-share benefit scoring instead of the (elapsed - cpu) per-thread math that missed them, and CPU:Elapsed ratios subtract external-wait time from CPU. New QueryTimeResult.ExternalWaitMs field exposes the delta. Adds PlanAnalyzer.GetOperatorMaxThreadOwnCpuMs to compute per-thread self-CPU for the new formula.
What's good
- Targeted fix with a specific math rationale (
wait_cpu_share * Σ max_thread_cpu) and doc-comments explaining the why. CpuTimeMs - ExternalWaitMsis clamped withMath.Max(0, ...)at every call site, so a bad wait-stats parse can't produce a negative ratio.ScoreWaitStatsnow collectsoperatorProfilesunconditionally (not just for parallel), which the new formula needs.
What needs attention
Base branch
PR targets main, not dev. Release PRs or not, that's contrary to the project's branching convention — confirm this is intentional before merging.
Cross-repo sync (PerformanceMonitor)
This touches the plan-analysis rule set (BenefitScorer.ScoreWaitStats, wait classification, external-wait formula) and the PlanAnalyzer API surface (GetOperatorMaxThreadOwnCpuMs). The equivalent change must also land in the PerformanceMonitor Dashboard and Lite copies, or their rule set will drift from Studio's on every external-wait plan. Track separately.
Tests
No changes under tests/PlanViewer.Core.Tests/. Logic added in BenefitScorer.CalculateExternalWaitBenefit, BenefitScorer.IsExternalWait, and PlanAnalyzer.GetOperatorMaxThreadOwnCpuMs has no unit coverage. At minimum:
- A parallel plan with a
MEMORY_ALLOCATION_EXTwait and known per-thread CPU → asserts Joe's 48.3% (test plan references it). - A serial plan with
PREEMPTIVE_*wait → asserts serial branch ofGetOperatorMaxThreadOwnCpuMs. IsExternalWaitpositive/negative cases, including theRESERVED_MEMORY_ALLOCATION_EXTsubstring-match variant.
The PR body's "Verify MwX36LEHJS raw XML when received" box is still unchecked; a captured fixture for that plan under tests/PlanViewer.Core.Tests/Plans/ would be the natural home.
Inline items
See line comments for: (1) triplicated external-wait summation across PlanViewerControl.axaml.cs:2845, ResultMapper.cs:114, and BenefitScorer.CalculateExternalWaitBenefit; (2) IsExternalWait Contains("MEMORY_ALLOCATION") being broader than the documented target and conflicting with ClassifyWaitType's "Memory" bucket; (3) OrderByDescending(...).First() picking a single Parallelism-exchange child nondeterministically in GetOperatorMaxThreadOwnCpuMs; (4) parentByThread overwrite on duplicate ThreadId; (5) fallback-to-known-wrong-ratio when stmtCpuMs <= 0; (6) Output → Services dependency in ResultMapper.
No Avalonia-specific issues (no AvaloniaEdit/ComboBox/:pointerover/brush changes). No contrast regressions.
Generated by Claude Code
| long externalWaitMs = 0; | ||
| foreach (var w in statement.WaitStats) | ||
| if (BenefitScorer.IsExternalWait(w.WaitType)) | ||
| externalWaitMs += w.WaitTimeMs; | ||
| var effectiveCpu = Math.Max(0, statement.QueryTimeStats.CpuTimeMs - externalWaitMs); |
There was a problem hiding this comment.
Third copy of the external-wait summation (also in ResultMapper.cs:114-119 and implicit in BenefitScorer.CalculateExternalWaitBenefit). Three call sites iterate WaitStats, call IsExternalWait, sum WaitTimeMs. Worth pulling into a single helper on PlanStatement (or BenefitScorer.SumExternalWaitMs(stmt)) so the definition of "external wait" has one home — especially if the rule ever expands beyond MEMORY_ALLOCATION / PREEMPTIVE_*.
Generated by Claude Code
| return wt.Contains("MEMORY_ALLOCATION") | ||
| || wt.StartsWith("PREEMPTIVE_"); |
There was a problem hiding this comment.
Contains("MEMORY_ALLOCATION") is broader than the doc-comment's stated target (MEMORY_ALLOCATION_EXT, RESERVED_MEMORY_ALLOCATION_EXT). Any future wait type with that substring will silently be classified as external and get the CPU-share formula even if the worker isn't CPU-busy in kernel. Tightening to StartsWith("MEMORY_ALLOCATION") || StartsWith("RESERVED_MEMORY_ALLOCATION") — or an explicit allowlist — matches the documented intent.
Also note this diverges from ClassifyWaitType at line 550, which still puts MEMORY_ALLOCATION in the "Memory" category. Two classifiers of the same wait that disagree is a future foot-gun.
Generated by Claude Code
| if (stmtCpuMs <= 0 || stmtElapsedMs <= 0) | ||
| return (double)wait.WaitTimeMs / Math.Max(1, stmtElapsedMs) * 100; |
There was a problem hiding this comment.
Early-return path when stmtCpuMs <= 0 falls back to wait_ms / elapsed * 100 — the same math that was already failing to surface these waits (per the method's own doc-comment). If the data is missing, returning 0 would be more honest than a known-wrong number.
Generated by Claude Code
| { | ||
| var childNode = child; | ||
| if (child.PhysicalOp == "Parallelism" && child.Children.Count > 0) | ||
| childNode = child.Children.OrderByDescending(c => c.ActualCPUMs).First(); |
There was a problem hiding this comment.
OrderByDescending(c => c.ActualCPUMs).First() picks a single Parallelism-exchange child and discards the other side(s), so any CPU those threads contributed is dropped from the per-thread subtraction. On ties, .First() is effectively arbitrary (enumeration order) which makes maxSelf non-deterministic for identical plan inputs.
Two questions worth confirming: (1) Is picking the max-CPU child actually the right semantic here, versus summing per-thread CPU across all exchange children keyed by ThreadId? (2) Does this mirror the existing elapsed-path helper GetPerThreadOwnElapsed — if not, the two "critical-path" calculations may diverge.
Generated by Claude Code
| var parentByThread = new Dictionary<int, long>(); | ||
| foreach (var ts in node.PerThreadStats) | ||
| parentByThread[ts.ThreadId] = ts.ActualCPUMs; |
There was a problem hiding this comment.
parentByThread[ts.ThreadId] = ts.ActualCPUMs — if PerThreadStats ever contains two entries for the same ThreadId (e.g., thread reuse across iterations, or parser quirks on some plan shapes), the second silently overwrites the first. Worth a TryGetValue + sum like childSumByThread uses a few lines below, for symmetry and safety.
Generated by Claude Code
| @@ -508,7 +507,15 @@ private static void ScoreWaitStats(PlanStatement stmt) | |||
| var category = ClassifyWaitType(wait.WaitType); | |||
| double benefitPct; | |||
There was a problem hiding this comment.
Gate IsExternalWait(...) && operatorProfiles.Count > 0: for an external wait on a plan without a RootNode (estimated-only plan that somehow has wait stats, or parse failures), this falls through to the Parallelism branch test and then to !isParallel || operatorProfiles.Count == 0 → simple ratio. That's the pre-fix behavior, which the doc-comments say is wrong for these waits. Likely fine in practice, but the fallback chain for external waits in degraded data is worth a single explicit branch.
Generated by Claude Code
| @@ -1,4 +1,5 @@ | |||
| using PlanViewer.Core.Models; | |||
| using PlanViewer.Core.Services; | |||
There was a problem hiding this comment.
Output layer now depends on PlanViewer.Core.Services to call BenefitScorer.IsExternalWait. Fine as long as you're OK with Output → Services, but the classification is really a model concern — a PlanStatement.ExternalWaitMs property (or a WaitStatInfo.IsExternal computed once during parse) would let both Output and the Avalonia control consume the value without reaching into Services.
Generated by Claude Code
|
|
||
| /// <summary> | ||
| /// Sum of external/preemptive wait time (MEMORY_ALLOCATION_*, PREEMPTIVE_*) — | ||
| /// these waits are CPU-busy in kernel and inflate CpuTimeMs vs real query CPU. | ||
| /// Subtract from CpuTimeMs for a truer CPU:Elapsed ratio. | ||
| /// </summary> | ||
| [JsonPropertyName("external_wait_ms")] | ||
| public long ExternalWaitMs { get; set; } |
There was a problem hiding this comment.
New JSON field external_wait_ms is a public output contract — MCP clients, downstream consumers, and the Dashboard/Lite copies of PerformanceMonitor will start seeing this. Worth mentioning in release notes and confirming any consumer that computes its own CPU:Elapsed ratio (e.g., dashboards) gets the same treatment.
Generated by Claude Code
Summary
Joe feedback #215 C6 + C7:
See PR #259.
Test plan
🤖 Generated with Claude Code